Experiments In Constructing A Corpus Of Discourse Trees
Abstract
We discuss a tagging schema and a tagging tool for labeling the rhetorical structure of texts. We also propose a statistical method for measuring agreement of hierarchical structure annotations and we discuss its strengths and weaknesses. The statistical measure we use suggests that annotators can achieve good levels of agreement on the task of determining the high-level, rhetorical structure of texts. Our empirical experiments also suggest that building discourse parsers that incrementally derive correct rhetorical structures of unrestricted texts without applying any form of backtracking is unfea-
Similar resources
Experiments in Constructing a Corpus of Discourse Trees: Problems, Annotation Choices, Issues
We present a tagging schema and a tagging tool for labeling the rhetorical structure of texts. We focus on presenting the difficulties that we faced in designing a discourse annotation manual and on discussing the choices that we made in order to address these difficulties. We report reliability results concerning our agreement on building the rhetorical structure of 90 texts of three genres: 3...
1 Alter 2 Loosen 3 Change Sequence 1 Alter 2 Loosen 3 Change Sequence Means 2 Loosen 3 Change Means
We present discourse annotation work aimed at constructing a parallel corpus of Rhetorical Structure trees for a collection of Japanese texts and their corresponding English translations. We discuss implications of our empirical findings for the task of text planning in the context of implementing multilingual natural language generation systems.
Contrasting the Automatic Identification of Two Discourse Markers in Multiparty Dialogues
The identification of occurrences of like and well that serve as discourse markers (DMs) is a classification problem which is studied here on a corpus of dialogue transcripts with more than 4,000 occurrences of each item. Decision trees using item-specific lexical, prosodic, positional and sociolinguistic features are trained using the C4.5 method. The results demonstrate improvement over past ...
A Corpus-based Study of Lexical Bundles in Discussion Section of Medical Research Articles
There has been increasing interest in utilizing corpora in linguistic research and pedagogy in recent years. Rhetorical organization of different sections of research articles may appear similar in various disciplines, but close examination may show subtle differences nonetheless. One of the features that has been at the center of attention especially in recent years is the idiomaticity of a di...
Linguistic Devices of Identity Representation in English Political Discourse with a Focus on Personal Pronouns: Power and Solidarity
The present study aimed to explore the use of pronominal reference for identity representation in terms of power and solidarity in English political discourse. The investigation was based on a corpus of four political interviews and debates amounting to 26,500 words. The analysis was both qualitative and quantitative. In the qualitative analysis, a discourse-analytic approach was used to fin...